
    Enabling Robots to Communicate their Objectives

    The overarching goal of this work is to efficiently enable end-users to correctly anticipate a robot's behavior in novel situations. Since a robot's behavior is often a direct result of its underlying objective function, our insight is that end-users need to have an accurate mental model of this objective function in order to understand and predict what the robot will do. While people naturally develop such a mental model over time through observing the robot act, this familiarization process may be lengthy. Our approach reduces this time by having the robot model how people infer objectives from observed behavior, and then it selects those behaviors that are maximally informative. The problem of computing a posterior over objectives from observed behavior is known as Inverse Reinforcement Learning (IRL), and has been applied to robots learning human objectives. We consider the problem where the roles of human and robot are swapped. Our main contribution is to recognize that unlike robots, humans will not be exact in their IRL inference. We thus introduce two factors to define candidate approximate-inference models for human learning in this setting, and analyze them in a user study in the autonomous driving domain. We show that certain approximate-inference models lead to the robot generating example behaviors that better enable users to anticipate what it will do in novel situations. Our results also suggest, however, that additional research is needed in modeling how humans extrapolate from examples of robot behavior.
    Comment: RSS 201
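
    As a minimal sketch of the inference the abstract refers to, the snippet below computes a Boltzmann-rational posterior over a discrete set of candidate objectives given one observed trajectory. The linear reward features, the rationality coefficient beta, and the omission of the per-objective partition function are illustrative assumptions, not the paper's exact model.

        import numpy as np

        # Boltzmann-rational posterior over candidate objectives given one observed
        # trajectory: P(theta | xi) proportional to exp(beta * R_theta(xi)) * P(theta).
        # NOTE: illustrative sketch -- it ignores the per-objective partition function
        # over trajectories; the candidate rewards, features, and beta are assumptions.

        def trajectory_return(step_features, theta):
            """Linear reward: per-step feature vectors dotted with weight vector theta."""
            return float(np.sum(step_features @ theta))

        def posterior_over_objectives(step_features, candidate_thetas, beta=1.0, prior=None):
            """Exact inference over a discrete set of candidate objectives; a human
            observer would only approximate this computation."""
            returns = np.array([trajectory_return(step_features, th) for th in candidate_thetas])
            prior = np.ones(len(candidate_thetas)) if prior is None else np.asarray(prior, float)
            logits = beta * returns + np.log(prior)
            logits -= logits.max()          # numerical stability
            post = np.exp(logits)
            return post / post.sum()

        # Example: 3 candidate objectives over 2 features (e.g. progress vs. caution),
        # one observed trajectory of 4 steps.
        candidates = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.5, 0.5])]
        traj = np.array([[0.9, 0.1], [0.8, 0.2], [0.9, 0.2], [0.7, 0.1]])
        print(posterior_over_objectives(traj, candidates, beta=2.0))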

    Coherent Soft Imitation Learning

    Imitation learning methods seek to learn from an expert either through behavioral cloning (BC) of the policy or inverse reinforcement learning (IRL) of the reward. Such methods enable agents to learn complex tasks from humans that are difficult to capture with hand-designed reward functions. Choosing BC or IRL for imitation depends on the quality and state-action coverage of the demonstrations, as well as additional access to the Markov decision process. Hybrid strategies that combine BC and IRL are not common, as initial policy optimization against inaccurate rewards diminishes the benefit of pretraining the policy with BC. This work derives an imitation method that captures the strengths of both BC and IRL. In the entropy-regularized ('soft') reinforcement learning setting, we show that the behaviour-cloned policy can be used as both a shaped reward and a critic hypothesis space by inverting the regularized policy update. This coherency facilitates fine-tuning cloned policies using the reward estimate and additional interactions with the environment. This approach conveniently achieves imitation learning through initial behaviour cloning, followed by refinement via RL with online or offline data sources. The simplicity of the approach enables graceful scaling to high-dimensional and vision-based tasks, with stable learning and minimal hyperparameter tuning, in contrast to adversarial approaches.
    Comment: 51 pages, 47 figures. DeepMind internship report
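
    As a rough illustration of the core idea, the sketch below treats a behaviour-cloned policy as a shaped reward by taking its log-ratio against a reference policy, as suggested by inverting the soft policy update. The uniform reference policy and the temperature alpha are assumptions for illustration, not the paper's exact construction.

        import numpy as np

        # Sketch: use a behaviour-cloned policy as a shaped reward in the
        # entropy-regularized ("soft") setting. Inverting the soft policy update
        # suggests a reward estimate proportional to the log-ratio of the cloned
        # policy against a reference policy. The uniform reference and the
        # temperature alpha are illustrative assumptions.

        def reward_from_cloned_policy(log_pi_bc, n_actions, alpha=1.0):
            """r_hat(s, a) = alpha * (log pi_BC(a|s) - log pi_ref(a|s)),
            with a uniform reference pi_ref(a|s) = 1 / n_actions."""
            log_pi_ref = -np.log(n_actions)
            return alpha * (log_pi_bc - log_pi_ref)

        # Example: cloned-policy probabilities in one state over 4 discrete actions.
        pi_bc = np.array([0.70, 0.15, 0.10, 0.05])
        r_hat = reward_from_cloned_policy(np.log(pi_bc), n_actions=4, alpha=0.5)
        print(r_hat)   # actions the expert favoured receive higher shaped reward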

    On Multi-objective Policy Optimization as a Tool for Reinforcement Learning

    Many advances that have improved the robustness and efficiency of deep reinforcement learning (RL) algorithms can, in one way or another, be understood as introducing additional objectives, or constraints, in the policy optimization step. This includes ideas as far-ranging as exploration bonuses, entropy regularization, and regularization toward teachers or data priors when learning from experts or in offline RL. Often, task reward and auxiliary objectives are in conflict with each other, and it is therefore natural to treat these examples as instances of multi-objective (MO) optimization problems. We study the principles underlying multi-objective RL (MORL) and introduce a new algorithm, Distillation of a Mixture of Experts (DiME), that is intuitive and scale-invariant under some conditions. We highlight its strengths on standard MO benchmark problems and consider case studies in which we recast offline RL and learning from experts as MO problems. This leads to a natural algorithmic formulation that sheds light on the connection between existing approaches. For offline RL, we use the MO perspective to derive a simple algorithm that optimizes for the standard RL objective plus a behavioral cloning term. This outperforms the state of the art on two established offline RL benchmarks.
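
    The offline-RL recasting alluded to above amounts to an actor loss with two terms. The sketch below combines the standard RL objective (maximize the critic's value of the policy's action) with a behavioral-cloning penalty toward dataset actions; the squared-error BC term and its weight are illustrative assumptions, not the DiME algorithm itself.

        import numpy as np

        # Sketch of an offline-RL actor loss with two terms: the standard RL
        # objective plus a behavioral-cloning penalty toward dataset actions.
        # The squared-error BC form and bc_weight are illustrative assumptions.

        def offline_actor_loss(q_values, policy_actions, dataset_actions, bc_weight=2.5):
            """Loss to minimize: -E[Q(s, pi(s))] + bc_weight * E[||pi(s) - a_data||^2]."""
            rl_term = -np.mean(q_values)                                   # maximize Q
            bc_term = np.mean(np.sum((policy_actions - dataset_actions) ** 2, axis=-1))
            return rl_term + bc_weight * bc_term

        # Example batch of 3 transitions with 2-D continuous actions.
        q = np.array([1.2, 0.8, 1.0])                          # Q(s, pi(s)) per sample
        pi_a = np.array([[0.1, 0.2], [0.0, 0.5], [0.3, 0.3]])  # policy's actions
        data_a = np.array([[0.0, 0.2], [0.1, 0.4], [0.3, 0.2]])
        print(offline_actor_loss(q, pi_a, data_a))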

    CpG Island microarray probe sequences derived from a physical library are representative of CpG Islands annotated on the human genome

    An effective tool for the global analysis of both DNA methylation status and protein–chromatin interactions is a microarray constructed with sequences containing regulatory elements. One type of array suited for this purpose takes advantage of the strong association between CpG Islands (CGIs) and gene regulatory regions. We have obtained 20 736 clones from a CGI library and used these to construct CGI arrays. The utility of this library requires proper annotation and assessment of the clones, including CpG content, genomic origin and proximity to neighboring genes. Alignment of clone sequences to the human genome (UCSC hg17) identified 9595 distinct genomic loci; 64% were defined by a single clone while the remaining 36% were represented by multiple, redundant clones. Approximately 68% of the loci were located near a transcription start site. The distribution of these loci covered all 23 chromosomes, with 63% overlapping a bioinformatically identified CGI. The high representation of genomic CGIs in this rich collection of clones supports the utilization of microarrays produced with this library for the study of global epigenetic mechanisms and protein–chromatin interactions. A browsable database is available online to facilitate exploration of the CGIs in this library and their association with annotated genes or promoter elements.
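
    The annotation described above is essentially interval bookkeeping: mapping each clone to a genomic locus, then asking whether that locus overlaps an annotated CGI and whether it lies near a transcription start site. The sketch below shows one hypothetical way to phrase those two checks; the coordinates and the proximity window are placeholders, not values from the study.

        # Sketch of the interval bookkeeping described above: does a mapped clone
        # locus overlap an annotated CpG island, and does it fall near a
        # transcription start site (TSS)? Coordinates and the proximity window
        # are placeholder values, not values from the study.

        def overlaps(locus, island):
            """Half-open genomic intervals (chrom, start, end): True if they overlap."""
            return locus[0] == island[0] and locus[1] < island[2] and island[1] < locus[2]

        def near_tss(locus, tss, window=1000):
            """True if the locus midpoint lies within `window` bp of the TSS position."""
            chrom, start, end = locus
            midpoint = (start + end) // 2
            return chrom == tss[0] and abs(midpoint - tss[1]) <= window

        clone_locus = ("chr11", 2150300, 2151100)      # placeholder hg17-style coordinates
        annotated_cgi = ("chr11", 2150000, 2151500)
        annotated_tss = ("chr11", 2150900)
        print(overlaps(clone_locus, annotated_cgi), near_tss(clone_locus, annotated_tss))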